Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18342
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (2 Unrelated Failures)
As of commit 5b130e1 with merge base 02bad9d.
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
Updates the Llama LoRA linear module definition and related export/quantization behavior, aiming to improve module structure/state_dict compatibility and make serialized constant keys more deterministic.
Changes:
- Refactors `LoRALinear` to wrap an internal `nn.Linear` submodule (with backward-compatible state_dict loading).
- Simplifies the `8da{4,8}w` quantization filter to only target `nn.Linear` modules.
- Changes XNNPACK constant `named_key` generation to be SHA256-only (no tensor-name prefix) and updates the LoRA CI expected output text.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| examples/models/llama/source_transformation/quantize.py | Simplifies the module filter used during TorchAO 8da quantization. |
| examples/models/llama/lora.py | Refactors LoRALinear to contain self.linear and adds state_dict key remapping for backward compatibility. |
| backends/xnnpack/operators/node_visitor.py | Changes serialized constant named_key to be content-hash only. |
| .ci/scripts/test_lora.sh | Updates expected output string for quantized LoRA test case. |
```diff
 )
 sha256_hash = hashlib.sha256(bytes(array))
-named_key = tensor.name + "_" + sha256_hash.hexdigest()
+named_key = sha256_hash.hexdigest()
```
Changing named_key to only the SHA256 digest makes keys content-addressed and no longer namespaced per tensor. When exporting multiple PTD/PTE named data maps that are loaded together (e.g., foundation + LoRA program-data separation), MergedDataMap::load rejects duplicate keys across maps even if the underlying bytes are identical, so identical constants (common for zero/one-initialized tensors) can now cause runtime load failures. Consider incorporating a stable per-parameter identifier into the key (e.g., get_attr_node.target/FQN) or otherwise namespacing by component while keeping determinism across exports.
Suggested change:

```diff
-named_key = sha256_hash.hexdigest()
+# Use a per-tensor namespace in the key to avoid collisions across
+# different PTD/PTE maps that may contain identical constant bytes.
+tensor_name = getattr(tensor, "name", None)
+if tensor_name is not None:
+    named_key = f"{tensor_name}:{sha256_hash.hexdigest()}"
+else:
+    named_key = sha256_hash.hexdigest()
```
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
backends/xnnpack/operators/node_visitor.py:633
`named_key` is now only the SHA256 digest of the constant bytes. If constants are split across multiple external named-data files (via `delegate_constant_tag`/`external_tag`), different tensors that happen to have identical bytes (e.g., many zero-filled bias tensors) will produce the same key in different files. `Module` merges those files into a `MergedDataMap`, which rejects duplicate keys across maps and will fail to load. Consider keeping per-tensor uniqueness in the key (e.g., include `tensor.name`/FQN as a suffix/prefix) and rely on `NamedDataStore`'s internal buffer deduplication to avoid duplicated storage, or otherwise ensure keys remain unique across external shards.
```python
sha256_hash = hashlib.sha256(bytes(array))
named_key = sha256_hash.hexdigest()
size = const_val.untyped_storage().nbytes()
xnn_graph.constant_data.append(
    ConstantDataOffset(offset=UINT64_MAX, size=size, named_key=named_key)
)
```
```python
def _load_from_state_dict(self, state_dict, prefix, *args, **kwargs):
    # Remap keys to "linear.*"
    for attr in ("weight", "bias"):
        old_key = prefix + attr
        new_key = prefix + "linear." + attr
        if old_key in state_dict and new_key not in state_dict:
            state_dict[new_key] = state_dict.pop(old_key)
    super()._load_from_state_dict(state_dict, prefix, *args, **kwargs)
```
`LoRALinear` now changes its state_dict surface from `weight`/`bias` to `linear.weight`/`linear.bias` and adds a compatibility remap in `_load_from_state_dict`. This behavior is easy to regress (especially when loading older checkpoints through the full model), but there is no test covering the backward-compat load path. Please add a unit test that constructs an old-format state_dict (with `...weight`/`...bias`) and verifies it loads into the updated module and produces identical outputs/parameters.
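A hedged sketch of such a test, using a stand-in module (`TinyLoRALinear` is hypothetical; the real `LoRALinear` constructor takes more arguments) with the same key-remapping hook as the PR:

```python
import torch
import torch.nn as nn


class TinyLoRALinear(nn.Module):
    """Stand-in for the refactored LoRALinear: wraps nn.Linear
    and remaps old flat checkpoint keys on load."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def _load_from_state_dict(self, state_dict, prefix, *args, **kwargs):
        # Same remap as the PR: "weight"/"bias" -> "linear.weight"/"linear.bias".
        for attr in ("weight", "bias"):
            old_key = prefix + attr
            new_key = prefix + "linear." + attr
            if old_key in state_dict and new_key not in state_dict:
                state_dict[new_key] = state_dict.pop(old_key)
        super()._load_from_state_dict(state_dict, prefix, *args, **kwargs)

    def forward(self, x):
        return self.linear(x)


# Old-format checkpoint: flat "weight"/"bias" keys, as produced
# before the refactor.
w = torch.randn(4, 8)
b = torch.randn(4)
m = TinyLoRALinear(8, 4)
m.load_state_dict({"weight": w.clone(), "bias": b.clone()})

# The old keys land on the wrapped nn.Linear's parameters.
assert torch.equal(m.linear.weight, w)
assert torch.equal(m.linear.bias, b)
```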
```diff
 def filter_fn(m, fqn):
-    # Check if it's a regular nn.Linear
-    is_linear = isinstance(m, nn.Linear)
-
-    # Check if it's a LoRALinear (which has a base weight parameter to quantize)
-    is_lora_linear = False
-    try:
-        from executorch.examples.models.llama.lora import LoRALinear
-
-        is_lora_linear = isinstance(m, LoRALinear)
-    except ImportError:
-        pass
-
-    # Check if the weight shape is compatible with group size
-    has_shape_compatible_with_group_size = False
-    if is_linear or is_lora_linear:
-        if group_size == 0:
-            has_shape_compatible_with_group_size = True
-        else:
-            has_shape_compatible_with_group_size = (
-                m.weight.shape[1] % group_size == 0
-            )
-    return (
-        is_linear or is_lora_linear
-    ) and has_shape_compatible_with_group_size
+    if not isinstance(m, nn.Linear):
+        return False
+    parts = fqn.split(".")
+    if "lora_a" in parts or "lora_b" in parts:
+        return False
+    if group_size == 0:
+        return True
+    return m.weight.shape[1] % group_size == 0
```
The new filter_fn behavior for 8da4w/8da8w quantization relies on module FQNs to skip LoRA adapter layers (lora_a/lora_b) while still quantizing the base projection (*.linear). This is subtle and currently untested; please add a regression test that runs this quantization path on a small model with LoRALinear and asserts that lora_a/lora_b weights remain unquantized while the base linear weight is quantized (and group_size filtering behaves as expected).
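A minimal sketch of the skip logic such a regression test would exercise, with `filter_fn` reproduced standalone from the diff above (the FQNs are hypothetical examples of base-projection and adapter modules):

```python
import torch.nn as nn

group_size = 32  # example group size for 8da4w-style quantization


def filter_fn(m, fqn):
    # Mirrors the new filter: only nn.Linear, skip LoRA adapters by FQN,
    # and require the input dim to be divisible by group_size.
    if not isinstance(m, nn.Linear):
        return False
    parts = fqn.split(".")
    if "lora_a" in parts or "lora_b" in parts:
        return False
    if group_size == 0:
        return True
    return m.weight.shape[1] % group_size == 0


base = nn.Linear(64, 64)   # base projection, in_dim divisible by 32
lora_a = nn.Linear(64, 8)  # LoRA adapter, identified by its FQN
odd = nn.Linear(60, 64)    # in_dim not divisible by group_size

assert filter_fn(base, "layers.0.attention.wq.linear")
assert not filter_fn(lora_a, "layers.0.attention.wq.lora_a")
assert not filter_fn(odd, "layers.0.feed_forward.w1.linear")
```

A full regression test would additionally run the quantization pass on a small model containing `LoRALinear` and assert the resulting dtypes, as the comment requests.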
Summary
Update the LoRA definition to use `nn.Linear` instead of `torch.nn.functional.linear`.
executorch/examples/models/llama/static_attention.py, line 1154 in eb92cec
Test plan
CI